ECE 901 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
Abstract
Yi i.i.d. ∼ pθ∗ , i = 1, . . . , n, where θ∗ ∈ Θ. We can view pθ∗ as a member of a parametric class of distributions, P = {pθ}θ∈Θ. Our goal is to use the observations {Yi} to select an appropriate distribution (i.e., a model) from P. We would like the selected distribution to be close to pθ∗ in some sense. We use the negative log-likelihood loss function, defined as l(θ, Yi) = − log pθ(Yi). The empirical risk is R̂n(θ) = −(1/n) ∑_{i=1}^{n} log pθ(Yi).
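As a concrete illustration, below is a minimal sketch of minimizing this empirical risk over a grid of candidate parameters, assuming a one-parameter Gaussian family pθ = N(θ, 1). The data, the grid, and the helper names (neg_log_likelihood, empirical_risk) are illustrative assumptions and are not taken from the lecture notes.

    # Minimal sketch: empirical-risk minimization with the negative
    # log-likelihood loss, assuming p_theta = N(theta, 1).
    import numpy as np

    def neg_log_likelihood(theta, y):
        # l(theta, Y_i) = -log p_theta(Y_i) for p_theta = N(theta, 1)
        return 0.5 * np.log(2 * np.pi) + 0.5 * (y - theta) ** 2

    def empirical_risk(theta, y):
        # R_hat_n(theta) = (1/n) * sum_i [ -log p_theta(Y_i) ]
        return np.mean(neg_log_likelihood(theta, y))

    rng = np.random.default_rng(0)
    theta_star = 1.3                      # true parameter (unknown in practice)
    y = rng.normal(theta_star, 1.0, 200)  # Y_i i.i.d. ~ p_{theta*}

    # Maximum likelihood = minimum empirical risk over a grid of candidates.
    grid = np.linspace(-5, 5, 2001)
    risks = [empirical_risk(t, y) for t in grid]
    theta_hat = grid[int(np.argmin(risks))]
    print(theta_hat)  # close to theta_star; for this family the MLE is the sample mean

For this Gaussian family the grid search is of course unnecessary (the MLE is the sample mean), but the same risk-minimization template applies to any parametric class P = {pθ}θ∈Θ.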
Similar resources
ECE 901 Lecture 15: Denoising Smooth Functions with Unknown Smoothness
Lipschitz functions are interesting, but they can be very rough (they can have many kinks). In many situations the functions are much smoother; this is how you would model the temperature inside a museum room, for example. Often we don't know how smooth the function might be, so an interesting question is whether we can adapt to the unknown smoothness. In this lecture we will use the Maximum Complexit...
ELEN6887 Lecture 14: Maximum Likelihood Estimation and Complexity Regularization
Yi i.i.d. ∼ pθ∗ , i = 1, . . . , n, where θ∗ ∈ Θ. We can view pθ∗ as a member of a parametric class of distributions, P = {pθ}θ∈Θ. Our goal is to use the observations {Yi} to select an appropriate distribution (i.e., a model) from P. We would like the selected distribution to be close to pθ∗ in some sense. We use the negative log-likelihood loss function, defined as l(θ, Yi) = − log pθ(Yi). The e...
ELEN6887 Lecture 13: Maximum Likelihood Estimation and Complexity Regularization
Yi i.i.d. ∼ pθ∗ , i = 1, . . . , n, where θ∗ ∈ Θ. We can view pθ∗ as a member of a parametric class of distributions, P = {pθ}θ∈Θ. Our goal is to use the observations {Yi} to select an appropriate distribution (i.e., a model) from P. We would like the selected distribution to be close to pθ∗ in some sense. We use the negative log-likelihood loss function, defined as l(θ, Yi) = − log pθ(Yi). The e...
Maximum Likelihood Estimation
Summary of Lecture 12: In the last lecture we derived a risk (MSE) bound for regression problems; i.e., select an f ∈ F so that E[(f(X) − Y)²] − E[(f∗(X) − Y)²] is small, where f∗(x) = E[Y |X = x]. The result is summarized below. Theorem 1 (Complexity Regularization with Squared Error Loss) Let X = R, Y = [−b/2, b/2], {Xi, Yi}_{i=1}^{n} i.i.d., PXY unknown, F = {collection of candidate functions}, f : R → ... (a schematic sketch of this kind of penalized selection appears after this list).
ELEN6887 Lecture 14: Denoising Smooth Functions with Unknown Smoothness
Lipschitz functions are interesting, but they can be very rough (they can have many kinks). In many situations the functions are much smoother; this is how you would model the temperature inside a museum room, for example. Often we don't know how smooth the function might be, so an interesting question is whether we can adapt to the unknown smoothness. In this lecture we will use the Maximum Complexit...
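Following up on the complexity-regularization summary above, here is a schematic sketch of selecting a regression model by minimizing empirical squared-error risk plus a complexity penalty. The nested polynomial classes and the penalty (deg + 1)·log(n)/n are illustrative assumptions only; they do not reproduce the exact penalty term of Theorem 1.

    # Schematic sketch of complexity-regularized regression with squared-error
    # loss: pick the model minimizing empirical risk plus a complexity penalty.
    # The polynomial classes and the penalty (deg + 1) * log(n) / n are
    # illustrative assumptions, not the theorem's exact penalty.
    import numpy as np

    rng = np.random.default_rng(1)
    n = 100
    x = rng.uniform(-1, 1, n)
    y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=n)   # noisy samples of f*

    best_deg, best_score = None, np.inf
    for deg in range(1, 11):                     # nested candidate classes F_1, F_2, ...
        coeffs = np.polyfit(x, y, deg)           # empirical-risk minimizer within the class
        resid = y - np.polyval(coeffs, x)
        emp_risk = np.mean(resid ** 2)           # R_hat_n(f) with squared-error loss
        penalty = (deg + 1) * np.log(n) / n      # complexity term, growing with model size
        score = emp_risk + penalty
        if score < best_score:
            best_deg, best_score = deg, score

    print(best_deg)  # degree chosen by penalized empirical risk

The penalty keeps the selection from always preferring the largest class: larger classes fit the data better in empirical risk but pay a higher complexity charge, which is the trade-off the complexity-regularization bound quantifies.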